DETAILED ACTION 



Response to Arguments 

The objections to claims 1 and 5 are withdrawn in light of the amendments. The 
objection to the title is maintained. Applicant has amended the title from "IMAGE 
PROCESSING APPARATUS, IMAGE PROCESSING METHOD, IMAGE 
PROCESSING PROGRAM, AND RECORDING MEDIUM" to "IMAGE PROCESSING 
APPARATUS". However, this amendment has not made the title clearly 
indicative of the invention to which the claims are directed. The phrase "image 
processing apparatus" is fairly broad and would encompass any application that 
includes some form of image processing. As such, this phrase cannot be an appropriate 
title. An appropriate title would clearly indicate to the reader what Applicant regards as 
his or her invention as reflected in the claims. Applicant's invention is apparently drawn 
to estimating a motion of a predetermined feature point using image processing as is 
indicated in the preamble of claim 1. 

Applicant's arguments pertaining to the prior art rejections have been fully 
considered but they are not persuasive. Applicant argues that the Heinzmann reference 
teaches away from the claimed invention and particularly the added limitations: "uses a 
perspective transformation". However, in examining the teachings present in page 144, 
column 1, paragraph 5, one can readily determine that Applicant's conclusions are 
incorrect. Firstly, in the instant paragraph, Heinzmann teaches that "Two different 
transformations may be used for pose estimation from monocular data: perspective or 
affine transformation. The perspective transformation precisely models the actual 
projection of a 3-D scene to the image plane". Therefore, Heinzmann presents two 
alternative embodiments for performing pose estimation. As such, claim 1 is 
appropriately rejected under 35 USC 102 as being anticipated by the Heinzmann 
reference. Note that the MPEP makes it clear that the "teaches away" argument is not 
germane to a 35 USC 102 rejection (MPEP 2131.04). This fact alone renders 
Applicant's arguments moot. 

Furthermore, even in considering a 35 USC 103 "obvious" rejection of claim 1, 
Applicant's arguments are moot. MPEP 2123 II states that "Disclosed examples and 
preferred embodiments do not constitute a teaching away from a broader disclosure or 
nonpreferred embodiments". As indicated above, the Heinzmann reference presents 
two embodiments for pose estimation: perspective and affine transformation. Even if 
one assumes arguendo that the perspective transformation is a nonpreferred 
embodiment (or broader disclosure), statements pertaining to why a preferred 
embodiment is preferred do not constitute a teaching away from alternative or 
nonpreferred embodiments (or broader disclosure), since Heinzmann explicitly states 
that both "transformations may be used". 

Also, in considering a 35 USC 103 "obvious" rejection of claim 1, the MPEP 
states that an argument of teaching away is moot when the reference provides 
motivation for the nonpreferred embodiment (MPEP 2144.05 III). Note that Heinzmann 
provides the motivation for using a perspective transformation in stating that "The 
perspective transformation precisely models the actual projection of a 3-D scene to the 
image plane". 
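
For illustration only, and not as a characterization of Heinzmann's implementation, the difference between the two transformations can be sketched as follows. The perspective projection divides each point by its own depth, whereas the affine (weak-perspective) approximation scales every point by a single reference depth. The function names, the focal length, and the sample points are assumptions introduced for this sketch.

```python
# Illustrative sketch: perspective vs. affine (weak-perspective) projection.
# All names and numeric values are hypothetical, not taken from Heinzmann.

def perspective_project(point, focal=1.0):
    """Project a camera-frame 3-D point (x, y, z) by dividing by its own depth."""
    x, y, z = point
    return (focal * x / z, focal * y / z)


def affine_project(point, mean_depth, focal=1.0):
    """Weak-perspective projection: every point shares one reference depth."""
    x, y, _ = point
    scale = focal / mean_depth
    return (scale * x, scale * y)


def max_projection_gap(points, focal=1.0):
    """Largest image-plane distance between the two projections of the points."""
    mean_depth = sum(p[2] for p in points) / len(points)
    gap = 0.0
    for p in points:
        u1, v1 = perspective_project(p, focal)
        u2, v2 = affine_project(p, mean_depth, focal)
        gap = max(gap, ((u1 - u2) ** 2 + (v1 - v2) ** 2) ** 0.5)
    return gap


# Two hypothetical feature sets with the same lateral layout at distance ~10.
shallow = [(0.1, 0.0, 9.75), (-0.1, 0.0, 10.25)]  # depth range 0.5 (< 1/10 of distance)
deep = [(0.1, 0.0, 8.0), (-0.1, 0.0, 12.0)]       # depth range 4.0 (> 1/10 of distance)

print(max_projection_gap(shallow), max_projection_gap(deep))
```

Under these assumed values, the gap between the two projections grows as the depth range of the object grows relative to its distance from the camera, which is consistent with the 1/10 condition that Heinzmann states for the affine approximation.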

Lastly, in considering a 35 USC 103 "obvious" rejection of claim 1, note that the 
"teaches away" argument pertains to teaching away from the claimed invention. 
Heinzmann indicates that the affine transformation would be more suitable for real-time 
systems because it has simpler calculations and only a twofold ambiguity, whereas the 
required calculations for the perspective transformation are complex and time 
consuming and can deliver up to a fourfold ambiguity in the estimate of the pose. 
However, the claims do not require that the claimed invention be a real-time system, so 
Applicant's arguments are moot. 

Specification 

The title of the invention is not descriptive. A new title is required that is clearly 
indicative of the invention to which the claims are directed. 

Claim Rejections - 35 USC § 102 

(b) the invention was patented or described in a printed publication in this or a foreign country or in public 
use or on sale in this country, more than one year prior to the date of application for patent in the United 
States. 

Claims 1-9 and 23-29 are rejected under 35 U.S.C. 102(b) as being anticipated 
by J. Heinzmann and A. Zelinsky, "3-D facial pose and gaze point estimation using a 
robust real-time tracking paradigm," IEEE Int. Workshop on Automatic Face and 
Gesture Recognition, pp. 142-147, 1998 (Heinzmann). 

As per claim 1, Heinzmann teaches an image processing apparatus for estimating a 
motion of a predetermined feature point of a 3D object from a motion picture of the 3D 
object taken by a monocular camera, comprising (Limitations present only within the 
preamble are not given patentable weight): 

observation vector extracting means for extracting projected coordinates 
of the predetermined feature point onto an image plane, from each of frames of the 
motion picture (Heinzmann: page 142, col 2, para 2: "forwarded to the 2-D model... 
image plane... 2-D image positions of the features"; Fig. 1); 

3D model initializing means for making the observation vector extracting 
means extract from an initial frame of the motion picture, initial projected coordinates in 
a model coordinate arithmetic expression for calculation of model coordinates of the 
predetermined feature point on the basis of a first parameter, a second parameter, and 
the initial projected coordinates (Heinzmann: Fig. 1; abstract: "3-D model... initialize 
the feature tracking"; parameters: abstract: "feature positions... gaze direction... 
head rotation"; Fig. 1: "feature positions... relative positions". Fig. 1 shows that 
the projected coordinates are extracted from the 2-D model into the 3-D model; 
page 142, col 2, para 2: "2-D image positions of the features are transferred to a 
3-D model of the feature locations"; page 144, col 1, para 5 - col 2, para 1: "affine 
transformation... a good approximation of perspective projection provided the 
depth of the object does not exceed 1/10 of the distance between camera and 
object. This is usually the case in face tracking applications.": Therefore, a 
parameter is the depth of the object, which is not expected to exceed 1/10 of the 
distance between the camera and the object; page 144, col 2, paras 2-3: "angle"; 
page 144, col 2, paras 4-5: "theta... orientations"; Fig. 3: "camera coordinates- 
angles"; Fig. 2: "angles"; page 145, col 1, para 3 - col 2, para 1: "distance and 
orientation"; page 145, col 2, para 2: "depth". Fig. 4: shows 9 parameters, page 
146, col 1, para 3 - col 2, para 1: "Figure 4 shows the output of some tracking 
parameter including the rotational angles, the displacement, the gaze direction of 
both eyes and the uncertainty of the face tracking"); and 

motion estimating means for calculating estimates of state variables 
including a third parameter in a motion arithmetic expression for calculation of 
coordinates of the predetermined feature point at a time of photography when a 
processed target frame of the motion picture different from the initial frame was taken, 
from the model coordinates, the first parameter, and the second parameter, and for 
outputting an output value about the motion of the predetermined feature point on the 
basis of the second parameter included in the estimates of the state variables 
(Heinzmann: page 142, col 2, para 2: "The estimated positions of the features 
determine the location within the next image frame of the hardware search 
windows." Note that the state variables include the parameters that were listed 
above: page 142, col 2, para 2: "3-D triplets"; Fig. 1: "3-D pose": output), 

wherein the model coordinate arithmetic expression is based on back 
projection of the monocular camera, the first parameter is a parameter independent of a 
local motion of a portion including the predetermined feature point, and the second 
parameter is a parameter dependent on the local motion of the portion including the 
predetermined feature point (Heinzmann: page 142, col 2, para 2: "The 3-D model is 
also projected back into the image plane to adapt the constraints in the 2-D 
model."; abstract: "monocular"; page 144, col 1, para 5 - col 2, para 1: "affine 
transformation... a good approximation of perspective projection provided the 
depth of the object does not exceed 1/10 of the distance between camera and 
object. This is usually the case in face tracking applications.": Therefore, a 
parameter is the depth of the object, which is not expected to exceed 1/10 of the 
distance between the camera and the object. This parameter is independent of 
local motion. Note that parameters are also listed above; page 145, col 2, para 2: 
"monocular"), and 

wherein the motion estimating means: 
calculates predicted values of the state variables at the time of photography when the 
processed target frame was taken, based on a state transition model (Heinzmann: 
page 143, col 2, para 6: "probabilistic relocation of features based on template 
correlations and a simple 2-D facial model"); 

applies the initial projected coordinates, and the first parameter and the 
second parameter included in the predicted values of the state variables, to the model 
coordinate arithmetic expression to calculate estimates of the model coordinates at the 
time of photography (Heinzmann: Figs. 1 and 3: Note that every frame corresponds 
to a time of photography); 

applies the third parameter in the predicted values of the state variables 
and the estimates of the model coordinates to the motion arithmetic expression to 
calculate estimates of coordinates of the predetermined feature point at the time of 
photography (Heinzmann: page 146, col 1, paras 1-2: the third parameter can be 
interpreted to be confidence; Figs. 1 and 3. See arguments made above for 
parameters.); 

applies the estimates of the coordinates of the predetermined feature point 
to an observation function based on an observation model of the monocular camera to 
calculate estimates of an observation vector of the predetermined feature point 
(Heinzmann: page 146, col 1, para 1-2. Figs. 1 and 3); 

makes the observation vector extracting means extract the projected 
coordinates of the predetermined feature point from the processed target frame, as the 
observation vector (Heinzmann: page 145, col 1, para 3: "gaze vector"; Figs. 1 and 
3; page 146, col 1, para 2: "gaze vector"); 

filters the predicted values of the state variables by use of the extracted 
observation vector and the estimates of the observation vector to calculate estimates of 
the state variables at the time of photography (Heinzmann: Fig. 1: "Kalman filtering". 
Note that every frame corresponds to a time of photography. As stated above, the 
state variables include the parameters. A coordinate is an observation vector 
originating from the origin in the corresponding coordinate space; page 145, col 
1, para 3: "gaze vector"; Figs. 1 and 3; page 146, col 1, para 2: "gaze vector... 
Intersecting G, with a world model yields the gaze point"); and 

uses a perspective transformation (Heinzmann: See arguments made above. 
Page 144, col 1, para 5: "Two different transformations may be used for pose 
estimation from monocular data: perspective or affine transformation. The 
perspective transformation precisely models the actual projection of a 3-D scene 
to the image plane"). 
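
For illustration only, the predict, observe, and filter sequence recited for the motion estimating means can be sketched as a scalar linear Kalman filter. This is a generic textbook form under stated assumptions, not the implementation of Heinzmann or of Applicant. The function names, the noise variances `q` and `r`, and the coefficients `a` and `h` are hypothetical values chosen for the sketch.

```python
# Illustrative sketch: one-dimensional linear Kalman filter applied frame by frame.
# All names and numeric values are assumptions, not taken from the record.

def kalman_step(x_est, p_est, z_obs, q=1e-3, r=1e-2, a=1.0, h=1.0):
    """One predict/update cycle.

    x_est, p_est: previous state estimate and its variance
    z_obs: observation extracted from the current frame
    q, r: process and measurement noise variances (assumed)
    a, h: state-transition and observation coefficients (assumed)
    """
    # Predict the state with the state transition model.
    x_pred = a * x_est
    p_pred = a * p_est * a + q

    # Apply the observation function to get the estimated observation.
    z_pred = h * x_pred

    # Filter: correct the prediction using the extracted observation.
    innovation = z_obs - z_pred
    s = h * p_pred * h + r          # innovation variance
    k = p_pred * h / s              # Kalman gain
    x_new = x_pred + k * innovation
    p_new = (1.0 - k * h) * p_pred
    return x_new, p_new


def track(observations, x0=0.0, p0=1.0):
    """Run the filter over a sequence of per-frame observations."""
    x, p = x0, p0
    estimates = []
    for z in observations:
        x, p = kalman_step(x, p, z)
        estimates.append(x)
    return estimates


estimates = track([1.0] * 50)
print(estimates[0], estimates[-1])
```

In this sketch the estimate moves from the assumed initial value toward the repeated observation, and the variance shrinks after each update.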

As per claim 2, Heinzmann teaches the image processing apparatus according to 
Claim 1, wherein the first parameter is a static parameter to converge at a specific 
value, and wherein the second parameter is a dynamic parameter to vary with the 
motion of the portion including the predetermined feature point (Heinzmann: See 
arguments made for the rejection of claim 1: The static parameter can be interpreted to 
be the length (or depth) of the gaze vector that converges to a specific gaze point 
(page 146, col 1, para 2). The second, dynamic parameter is the angle or orientation that 
varies over time along with the motion). 

As per claim 3, Heinzmann teaches the image processing apparatus according to 
Claim 2, wherein the static parameter is a depth from the image plane to the 
predetermined feature point (Heinzmann: See arguments made for the rejection of claims 1 
and 2: The depth of the feature from the image plane is considered as a parameter.). 

As per claim 4, Heinzmann teaches the image processing apparatus according 
to Claim 2, wherein the dynamic parameter is a rotation parameter for specifying a 
rotation motion of the portion including the predetermined feature point (Heinzmann: 
See arguments made for the rejection of claims 1 and 2: The rotation is considered as a 
parameter). 

As per claim 5, Heinzmann teaches the image processing apparatus according to 
Claim 4, wherein the rotation parameter is an angle made by a vector from an origin to 
the predetermined feature point, relative to two coordinate axes in a coordinate system 
whose origin is at a center of the portion including the predetermined feature point 
(Heinzmann: See arguments made for the rejection of claim 1; page 146, col 1: "eye 
orientation... alpha_x, alpha_y... origin is located between the eyes"). 

As per claim 6, Heinzmann teaches the image processing apparatus according to 
Claim 1, wherein the first parameter is a rigid parameter, and wherein the second 
parameter is a non-rigid parameter (Heinzmann: See arguments made for the rejection 
of claims 1 and 2: The depth is the rigid parameter, and the angle/orientation is the 
non-rigid parameter. Also, affine and perspective transformations are non-rigid 
transformations, but the depth would not be affected by the transformations). 

As per claim 7, Heinzmann teaches the image processing apparatus according to 
Claim 6, wherein the rigid parameter is a depth from the image plane to the model 
coordinates (Heinzmann: See arguments made for the rejection of claims 1 and 6). 

As per claim 8, Heinzmann teaches the image processing apparatus according 
to Claim 6, wherein the non-rigid parameter is a change amount about a position 
change of the predetermined feature point due to the motion of the portion including the 
predetermined feature point (Heinzmann: See arguments made for the rejection of claims 1 
and 5). 

As per claim 9, Heinzmann teaches the image processing apparatus according 
to Claim 1, wherein the motion model is based on rotation and translation 
motions of the 3D object, and wherein the third parameter is a translation parameter for 
specifying a translation amount of the 3D object and a rotation parameter for specifying 
a rotation amount of the 3D object (Heinzmann: See arguments made for the rejection 
of claims 1, 2, and 5: Fig. 4: "DispX... DispY": translation; Fig. 1: "template tracking". 
Template tracking or matching accounts for in-plane translations. Fig. 3: "camera 
coordinates... angles"; Fig. 1). 

As per claim 23, Heinzmann teaches the image processing apparatus according 
to Claim 1, wherein a 3D structure of a center of a pupil on a facial picture is 
defined by a static parameter and a dynamic parameter, and wherein the a gaze is 
determined by estimating the static parameter and the dynamic parameter 
(Heinzmann: See arguments made for the rejection of claims 1, 2, 5, and 9). 

As per claim 24, Heinzmann teaches the image processing apparatus according 
to Claim 23, wherein the static parameter is a depth of the pupil in a camera coordinate 
system (Heinzmann: See arguments made for the rejection of claims 1, 2, 5, and 9). 

As per claim 25, Heinzmann teaches the image processing apparatus according 
to Claim 23, wherein the dynamic parameter is a rotation parameter of an eyeball 
(Heinzmann: See arguments made for the rejection of claims 1, 2, 5, and 9). 

As per claim 26, Heinzmann teaches the image processing apparatus according 
to Claim 25, wherein the rotation parameter of the eyeball has two degrees of freedom 
to permit rotations with respect to two coordinate axes in an eyeball coordinate system 
(Heinzmann: See arguments made for the rejection of claims 1, 2, 5, and 9: alpha_x, alpha_y). 

As per claim 27, Heinzmann teaches the image processing apparatus according 
to Claim 1, wherein a 3D structure of the 3D object on the a picture is defined by a 
rigid parameter and a non-rigid parameter and wherein the motion of the 3D object is 
determined by estimating the rigid parameter and the non-rigid parameter (Heinzmann: 
See arguments made for the rejection of claims 1, 2, 5, 6, and 9). 

As per claim 28, Heinzmann teaches the image processing apparatus according 
to Claim 27, wherein the rigid parameter is a depth of a feature point of the 3D object in 
a model coordinate system (Heinzmann: See arguments made for the rejection of claims 1, 
2, 5, 6, and 9). 

As per claim 29, Heinzmann teaches the image processing apparatus according 
to Claim 27, wherein the non-rigid parameter is a change amount of a feature point of 
the 3D object in a model coordinate system (Heinzmann: See arguments made for 
the rejection of claims 1, 2, 5, 6, and 9). 

Claim Rejections - 35 USC § 103 

The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

Claims 1-10 and 23-29 are rejected under 35 U.S.C. 103(a) as being unpatentable 
over J. Heinzmann and A. Zelinsky, "3-D facial pose and gaze point estimation using a 
robust real-time tracking paradigm," IEEE Int. Workshop on Automatic Face and 
Gesture Recognition, pp. 142-147, 1998 (Heinzmann) in view of Park, K. R., et al., 
"Gaze position detection by computing the three dimensional facial positions and 
motions," Pattern Recognition, Vol. 35, No. 11, Nov. 2002, pp. 2559-2569 (Park). 

Arguments made in rejecting claims 1-9 and 23-29 under 35 USC 103 are analogous to 
the arguments for rejecting claims 1-9 and 23-29 under 35 USC 102 made above. 
Note that Park teaches using a perspective transformation (Park: page 2563, col 1, 
para 1: "perspective camera model"; page 2568, col 1, para 3: "perspective 
transformation"). 

Thus, it would have been obvious for one of ordinary skill in the art at the time the 
invention was made to implement the teachings of Park into Heinzmann since 
Heinzmann suggests a system for determining face and gaze positions using a 
perspective transformation in general and Park suggests the beneficial use of a system 
for determining face and gaze positions using a perspective transformation so as to 
"obtain the exact 3D positions of the initial feature points" (Park: page 2568, col 1, para 
3) in the analogous art of image processing. It would have been obvious for one of 
ordinary skill in the art at the time the invention was made to implement the teachings of 
Park into Heinzmann since Heinzmann suggests the motivation "precisely models the 
actual projection" (Heinzmann: page 144, col 1, para 5). Therefore, both the Heinzmann 
and Park references highlight that the perspective transformation has the benefit of 
precision. Furthermore, one of ordinary skill in the art at the time the invention was 
made could have combined the elements as claimed by known methods and, in 
combination, each component functions the same as it does separately. One of ordinary 
skill in the art at the time the invention was made would have recognized that the results 
of the combination would be predictable. 

As per claim 10, Heinzmann teaches the image processing apparatus according 
to Claim 1, wherein the motion estimating means applies Kalman filtering as 
said filtering (Heinzmann: See arguments made for rejecting claim 1). Heinzmann 
does not teach extended Kalman filtering. 

Park teaches extended Kalman filtering (Park: page 2564, col 1, para 4: "extended 
Kalman"). 

Thus, it would have been obvious for one of ordinary skill in the art at the time the 
invention was made to implement the teachings of Park into Heinzmann since 
Heinzmann suggests a system for determining face and gaze positions using Kalman 
filtering in general and Park suggests the beneficial use of a system for determining 
face and gaze positions using extended Kalman filtering in the analogous art of 
image processing. It would have been obvious for one of ordinary skill in the art at the 
time the invention was made to implement the teachings of Park into Heinzmann since it 
is well known that the extended Kalman filter is applicable to nonlinear problems 
whereas the Kalman filter is not. Therefore, one can apply the extended Kalman filter in 
order to obtain a more robust system. Furthermore, one of ordinary skill in the art at the 
time the invention was made could have combined the elements as claimed by known 
methods and, in combination, each component functions the same as it does 
separately. One of ordinary skill in the art at the time the invention was made would 
have recognized that the results of the combination would be predictable. 
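
For illustration only, the way an extended Kalman filter handles a nonlinear observation can be sketched with a perspective observation model, in which the image coordinate depends on the reciprocal of depth. The filter linearizes that observation function about the predicted state by using its derivative (Jacobian). This is a generic sketch under stated assumptions, not the method of Park or Heinzmann. The function name, the known lateral offset, the focal length, and the noise variances are hypothetical.

```python
# Illustrative sketch: scalar extended Kalman filter estimating a static depth
# from the nonlinear perspective observation u = focal * x_lateral / z.
# All names and numeric values are assumptions, not taken from the references.

def ekf_depth_step(z_est, p_est, u_obs, x_lateral=1.0, focal=1.0, q=1e-6, r=1e-4):
    """One predict/update cycle for the depth estimate z_est with variance p_est."""
    # Predict: the depth is modeled as static, so only the variance grows.
    z_pred = z_est
    p_pred = p_est + q

    # Nonlinear observation function and its derivative at the prediction.
    u_pred = focal * x_lateral / z_pred
    h_jac = -focal * x_lateral / (z_pred ** 2)

    # Update with the linearized observation model.
    s = h_jac * p_pred * h_jac + r
    k = p_pred * h_jac / s
    z_new = z_pred + k * (u_obs - u_pred)
    p_new = (1.0 - k * h_jac) * p_pred
    return z_new, p_new


true_depth = 5.0
u_true = 1.0 * 1.0 / true_depth  # noiseless observation of the true depth

z, p = 4.0, 1.0                  # deliberately wrong initial depth
for _ in range(200):
    z, p = ekf_depth_step(z, p, u_true)

print(z)
```

With these assumed values, the depth estimate moves from the initial guess toward the true depth over successive frames.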

Conclusion 

THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy 
as set forth in 37 CFR 1.136(a). 

A shortened statutory period for reply to this final action is set to expire THREE 
MONTHS from the mailing date of this action. In the event a first reply is filed within 
TWO MONTHS of the mailing date of this final action and the advisory action is not 
mailed until after the end of the THREE-MONTH shortened statutory period, then the 
shortened statutory period will expire on the date the advisory action is mailed, and any 
extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of 
the advisory action. In no event, however, will the statutory period for reply expire later 
than SIX MONTHS from the mailing date of this final action. 

Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to Atiba Fitzpatrick, whose telephone number is 
(571) 270-5255. The examiner can normally be reached M-F, 10:00am-6pm. 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Samir Ahmed, can be reached on (571) 272-7413. The fax phone number for 
Examiner Atiba Fitzpatrick is 571-270-6255. 

Information regarding the status of an application may be obtained from the Patent 
Application Information Retrieval (PAIR) system. Status information for published 
applications may be obtained from either Private PAIR or Public PAIR. Status 
information for unpublished applications is available through Private PAIR only. For 
more information about the PAIR system, see http://pair-direct.uspto.gov. Should you 
have questions on access to the Private PAIR system, contact the Electronic Business 
Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO 
Customer Service Representative or access to the automated information system, call 
800-786-9199 (IN USA OR CANADA) or 571-272-1000. 

/Bhavesh M Mehta/ 

Supervisory Patent Examiner, Art Unit 2624 
Atiba Fitzpatrick 
/A. O. F./ 

Examiner, Art Unit 2624 



